Vancouver Street Trees

Exploratory Data Analysis | Pankti Shah | December 2021



Exploratory Data Analysis

The dataset contains ~600 street addresses, ~160 species names (species_name), 22 Vancouver neighbourhoods, and 62 genus names. There are ~5000 data points, but date_planted is missing quite a few values. The mean tree diameter is ~12.2 and the mean height is 2.7. Column datatypes include datetime64, three float64, three int64, and 12 object columns.
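A summary like the one above can be produced with a small helper. This is a minimal sketch on a made-up three-row frame, not the notebook's original code; the toy values are purely illustrative.

```python
import pandas as pd

def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: dtype, distinct non-null values, missing values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_unique": df.nunique(),        # NaN/NaT are not counted as a value
        "n_missing": df.isna().sum(),
    })

# Toy frame (illustrative only, not the real dataset).
toy = pd.DataFrame({
    "species_name": ["RUBRUM", "KOBUS", "RUBRUM"],
    "diameter": [10.0, 12.5, 8.0],
    "date_planted": pd.to_datetime(["2007-05-01", None, "1999-11-15"]),
})
summary = summarize(toy)
print(summary)
```

Calling `df.describe()` on the real frame would add the mean diameter and height quoted above.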

The following columns will be dropped from the analysis: std_street, on_street, civic_number, tree_id, cultivar_name. They are dropped because they provide no information relevant to general data exploration.
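Dropping these columns is a one-liner in pandas. The sketch below uses a tiny made-up frame to show the call; the helper name `drop_unused_columns` is hypothetical.

```python
import pandas as pd

# Columns named in the text as irrelevant for general exploration.
COLUMNS_TO_DROP = ["std_street", "on_street", "civic_number", "tree_id", "cultivar_name"]

def drop_unused_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df without the address and identifier columns."""
    # errors="ignore" keeps the call safe if a column is already absent.
    return df.drop(columns=COLUMNS_TO_DROP, errors="ignore")

# Toy frame (illustrative only, not the real dataset).
toy = pd.DataFrame({
    "tree_id": [1, 2],
    "std_street": ["A ST", "B ST"],
    "species_name": ["RUBRUM", "KOBUS"],
    "diameter": [10.0, 12.5],
})
trimmed = drop_unused_columns(toy)
print(list(trimmed.columns))
```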

Exploratory graphs will be made for the categorical and numerical columns. I am hoping to explore tree diversity in Vancouver through species, plant_area, the spread of species across neighbourhoods, and the age of the trees (date_planted). I also want to explore tree height and diameter over the years and how trees are spread across the city (i.e., the number of trees planted on the even vs. odd side of the street, and by neighbourhood).

For the purpose of this data exploration, no treatment will be done to the null values.

Evaluate distributions of all numerical columns using repeat()

Next, we will explore the same quantitative data using heat maps to gain better insight into the relationships between variables.

The diagonal cells of the matrix are linear because each variable is plotted against itself.

The greatest tree counts occur at diameter <10 and height between 1 and 1.5. Plotting latitude and longitude with height and diameter on a map, where location can be associated with neighbourhood, will be more useful. Before moving on to this, let's quickly look at the distributions of the categorical columns.

Quick insights from categorical data exploration above:

Next, we will explore some quantitative data using a map visualization to see which neighbourhoods have higher average diameter and height.

On average, the north-west end of the city has more large-diameter trees than the south-east. The West Point Grey neighbourhood has the most trees with greater diameter on average. Next, we will develop a map to visualize the distribution of tree height.

The following neighbourhoods have more trees with greater height on average: West Point Grey, Kitsilano, Fairview, West End, Mount Pleasant, Shaughnessy, and Dunbar-Southlands. Generally, the west end of the city has more trees with heights averaging ~3.0. Elsewhere, the average height is ~2.0, which is noticeably shorter.
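The per-neighbourhood averages behind the two maps can be tabulated before any plotting. This is a sketch under assumptions: the column names `neighbourhood_name`, `diameter`, and `height_range_id` are guesses about the dataset's schema, and the toy rows are invented.

```python
import pandas as pd

def neighbourhood_averages(df: pd.DataFrame) -> pd.DataFrame:
    """Mean diameter and height per neighbourhood, largest diameter first."""
    return (
        df.groupby("neighbourhood_name")[["diameter", "height_range_id"]]
        .mean()
        .sort_values("diameter", ascending=False)
    )

# Toy frame (illustrative only; column names are assumptions).
toy = pd.DataFrame({
    "neighbourhood_name": ["WEST POINT GREY", "WEST POINT GREY", "SUNSET", "SUNSET"],
    "diameter": [20.0, 16.0, 8.0, 10.0],
    "height_range_id": [4, 2, 2, 2],
})
averages = neighbourhood_averages(toy)
print(averages)
```

Joining this table to neighbourhood boundaries would give the values to colour each map region.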

Next, we will explore the distribution of species across neighbourhoods to determine which species are most commonly found.

The following species are found across all neighbourhoods: Acerifolia X, Americana, Betulus, Campestre, Cerasifera, Euchlora X, Freemani X, Kobus, Platanoides, Rubrum, Serrulata. Using this information, it would be valuable to further explore biodiversity within each neighbourhood and the average age of the trees.
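One way to find species present in every neighbourhood is to count distinct neighbourhoods per species and compare against the total. This is a minimal sketch; `neighbourhood_name` is an assumed column name and the toy rows are invented.

```python
import pandas as pd

def species_in_every_neighbourhood(
    df: pd.DataFrame,
    species_col: str = "species_name",
    hood_col: str = "neighbourhood_name",  # assumed column name
) -> list:
    """Species that appear in all neighbourhoods present in df."""
    total_hoods = df[hood_col].nunique()
    hoods_per_species = df.groupby(species_col)[hood_col].nunique()
    return sorted(hoods_per_species[hoods_per_species == total_hoods].index)

# Toy frame (illustrative only): RUBRUM is in both neighbourhoods, KOBUS in one.
toy = pd.DataFrame({
    "species_name": ["RUBRUM", "RUBRUM", "KOBUS"],
    "neighbourhood_name": ["KITSILANO", "SUNSET", "KITSILANO"],
})
everywhere = species_in_every_neighbourhood(toy)
print(everywhere)
```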

Next, let's see if there is any relationship between the age of a tree (the year it was planted) and its height.

The bar graph above doesn't provide much insight. Generally, though, older trees are taller. The largest number of trees was planted in 2007.
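A table of mean height and tree count per planting year may be easier to read than the bar graph. The sketch below skips rows with a missing date_planted rather than imputing them; `height_range_id` is an assumed column name and the toy rows are invented.

```python
import pandas as pd

def height_by_year(df: pd.DataFrame) -> pd.DataFrame:
    """Mean height and number of trees per planting year."""
    year = pd.to_datetime(df["date_planted"], errors="coerce").dt.year
    has_year = year.notna()
    # Keep only rows with a usable planting date.
    valid = df.loc[has_year].assign(year_planted=year[has_year].astype(int))
    return valid.groupby("year_planted")["height_range_id"].agg(
        mean_height="mean", n_trees="count"
    )

# Toy frame (illustrative only; one row has no planting date).
toy = pd.DataFrame({
    "date_planted": ["1995-01-10", "1995-06-01", "2007-03-03", None],
    "height_range_id": [4, 6, 2, 3],
})
by_year = height_by_year(toy)
print(by_year)
```

Sorting the result by `n_trees` would surface the busiest planting year.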

Next, we will explore the number of trees planted per neighbourhood.

From the data, the following neighbourhood appears relatively newer because most of its trees were planted in recent years: Sunset

The following are generally older neighbourhoods, as most of their trees were planted before the 2000s: Hastings-Sunrise, Killarney

Based on the data, most neighbourhoods show a fairly consistent level of tree planting across the years. However, some areas, such as Fairview, Strathcona, and West End, have been quite inconsistent. Otherwise, on average 1-2 trees are planted every year in each neighbourhood.
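The planting counts discussed above amount to a neighbourhood-by-year cross-tabulation. This is a minimal sketch that assumes a `neighbourhood_name` column and uses invented rows; trees without a planting date are left out.

```python
import pandas as pd

def plantings_per_year(df: pd.DataFrame) -> pd.DataFrame:
    """Count of trees planted, one row per neighbourhood, one column per year."""
    year = pd.to_datetime(df["date_planted"], errors="coerce").dt.year
    has_year = year.notna()
    years = year[has_year].astype(int).rename("year_planted")
    return pd.crosstab(df.loc[has_year, "neighbourhood_name"], years)

# Toy frame (illustrative only; column names are assumptions).
toy = pd.DataFrame({
    "neighbourhood_name": ["SUNSET", "SUNSET", "KILLARNEY", "KILLARNEY"],
    "date_planted": ["2015-04-01", "2015-09-12", "1994-02-20", None],
})
counts = plantings_per_year(toy)
print(counts)
```

Years in which a neighbourhood had no plantings show up as zeros, which makes gaps in planting easy to spot.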

Next, we will explore the height and diameter of trees through density plots.

Across Vancouver, tree heights range from 1 to 9 m, with most trees concentrated around a height of 2. Tree diameter peaks at ~12 cm for the majority of trees. It would be interesting to evaluate the most common height-to-diameter ratio and how it varies over the years.
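The height-to-diameter ratio mentioned above could be computed per tree as follows. This is a sketch with assumed column names (`height_range_id`, `diameter`) and invented rows; non-positive diameters are treated as missing to avoid dividing by zero.

```python
import pandas as pd

def height_diameter_ratio(df: pd.DataFrame) -> pd.Series:
    """Height divided by diameter; NaN where diameter is zero or negative."""
    safe_diameter = df["diameter"].where(df["diameter"] > 0)
    return (df["height_range_id"] / safe_diameter).rename("height_to_diameter")

# Toy frame (illustrative only; the second tree has a recorded diameter of 0).
toy = pd.DataFrame({
    "height_range_id": [2, 4, 3],
    "diameter": [10.0, 0.0, 12.0],
})
ratio = height_diameter_ratio(toy)
print(ratio)
```

Grouping this series by planting year would show how the ratio changes over time.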

Concluding Remarks

In this document, various plots were made to explore the data, including distribution plots and heat map matrices for quantitative and qualitative data. Quick distribution plots were made to explore species distribution across neighbourhoods. To better understand the height and diameter distributions, density plots were made. In addition, a bar plot showing the number of trees planted each year along with their height distribution was developed.

A couple of things I will consider for the analysis: first, removing all rows that contain any null values so that consistent data is used throughout. Second, based on this exploratory analysis, I am interested in diving deeper into the relationship between a tree's height and diameter and when it was planted. I also want to gain more insight into the biodiversity of tree species across Vancouver neighbourhoods. I will be using both of the Vancouver city maps that show the distribution of height and diameter. For the other plots, I will modify the existing plots to give the audience a clearer picture and tell a clear, compelling story. For example, I will plot average diameter and height versus the year the trees were planted to see whether older trees have greater diameter and height than younger trees. I will also develop a heat map of popular species and the neighbourhoods they are found in.
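The data behind the planned species heat map could be prepared roughly as follows: drop incomplete rows, keep the most common species, and count trees per species and neighbourhood. This is only a sketch; the cut-off `n`, the helper name, and the `neighbourhood_name` column are assumptions, and the rows are invented.

```python
import pandas as pd

def top_species_table(df: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Tree counts for the n most common species, by neighbourhood."""
    complete = df.dropna()  # remove rows with any null value
    top = complete["species_name"].value_counts().head(n).index
    subset = complete[complete["species_name"].isin(top)]
    return pd.crosstab(subset["species_name"], subset["neighbourhood_name"])

# Toy frame (illustrative only; one BETULUS row has no planting date).
toy = pd.DataFrame({
    "species_name": ["RUBRUM", "RUBRUM", "RUBRUM", "KOBUS", "KOBUS", "BETULUS", "BETULUS"],
    "neighbourhood_name": ["A", "B", "A", "A", "B", "A", "B"],
    "date_planted": ["2000-01-01", "2001-01-01", "2002-01-01", "2003-01-01",
                     "2004-01-01", None, "2005-01-01"],
})
table = top_species_table(toy, n=2)
print(table)
```

The resulting table can be passed to any heat map plotting function, with species on one axis and neighbourhoods on the other.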

The following columns will not be used in further analysis: genus name, assigned, common_name, and plant_area, as they don't provide relevant information for the questions I am exploring.